ABSTRACT


In  Software  Defined  Radios  a  good  portion  (or  even  the  entirety)  of  the 
modulation  and  demodulation  processes  is  performed  in  the  digital  domain.  The  data  rate 
of  the  transmitted  information  is  very  important,  since  efficiency  is  a  key  requirement  in 
real  time  implementations  and  cost  increases  considerably  with  the  number  of  samples 
per  second  to  be  processed.  In  this  thesis,  we  address  the  problem  of  efficient  design  of 
the  resampling  operations,  so  that  they  can  be  implemented  on  Field  Programmable  Gate 
Arrays  (FPGAs). 

A  set  of  filtering  and  resampling  operations  is  developed  in  the  Simulink 
environment  through  Xilinx/Simulink  blocksets,  where  all  the  included  subsystems  of  the 
design  are  fully  accessible  by  the  designer  in  any  stage  of  operation.  The  key  ingredient  is 
the  use  of  a  Multiplier  and  Accumulator  (MAC)  architecture,  which  can  be  either  time 
multiplexed  for  maximum  hardware  efficiency,  or  run  on  a  parallel  structure  for 
maximum  time  efficiency. 
EXECUTIVE SUMMARY


In  Software  Defined  Radios  (SDR)  a  good  portion  (or  even  the  entirety)  of  the 
modulation  and  demodulation  process  is  performed  in  the  digital  domain.  The 
reconfigurability  and  the  versatility  of  the  SDR  can  be  efficiently  supported  by  the  Field 
Programmable  Gate  Arrays  (FPGAs)  for  hardware  implementations. 

FPGAs  are  high  performance  integrated  circuits  suitable  for  many  Digital  Signal 
Processing  (DSP)  applications  with  the  feature  of  being  reprogrammable  by  the  designer. 
In  this  way,  the  system  can  be  easily  reconfigured  to  a  number  of  different  applications. 

The software needed to program an FPGA is provided by System
Generator (Sysgen), which is an FPGA design program responsible for driving the FPGA
through the high-level design environment of Simulink. A combination of common and
synthesized Simulink/Xilinx blocks from the Simulink library, along with MATLAB
code, has been used to construct a configurable scheme capable of
implementing the following three operations:

a)  Finite  Impulse  Response  (FIR)  filter 

b)  Decimation  by  an  integer  factor 

c)  Interpolation  by  an  integer  factor 

The  key  ingredient  is  the  use  of  the  Multiplier  and  Accumulator  (MAC) 
architecture,  which  can  be  either  time  multiplexed  for  maximum  hardware  efficiency,  or 
embedded  on  a  parallel  structure  for  maximum  time  efficiency. 

The  main  components  of  the  implementation  are  the  Dual  Port  Ram  Xilinx  block, 
which  is  a  random  access  memory  containing  both  data  and  the  FIR  filter  coefficients, 
together  with  the  DSP48  Xilinx  block,  which  performs  the  multiplication  and  addition  on 
a  sequential  basis.  The  DSP48  block  is  specifically  designed  for  high-speed  arithmetic 
operations  and  it  is  part  of  the  standard  Xilinx  Virtex  family  architecture.  The  objective  is 
to  perform  the  proper  arrangement  of  the  input  data  and  FIR  filter  coefficients  so  that  the 
resulting multiplication and accumulation will perform the three examined operations
according to the theoretical formulations. Since the operations are performed serially, the
data need to be upsampled in order to handle the increased clock rate provided by System
Generator (Sysgen) and then properly downsampled.

In  this  research  we  have  shown  that  for  all  three  cases  (FIR  filter,  Decimation, 
Interpolation)  the  overall  structure  is  the  same.  What  defines  each  operation  is  the  control 
logic  (Controller)  and  the  storing  of  the  filter  parameters. 

The  controller  consists  of  logic  blocks  from  the  Xilinx  blockset  and  it  is 
responsible  for  updating  the  Dual  Port  Ram’s  memory  vectors  (according  to  Sysgen  clock 
rate)  in  order  to  provide  the  proper  dual  sequential  output.  The  dual  output  of  the  memory 
block is multiplied and accumulated by the DSP48 math slice. The output of the DSP48 is
a bitstream in which the desired output samples of the three examined operations are
embedded at regular positions determined by the Sysgen clock rate. Therefore, the final
output can be obtained by downsampling the output of the DSP48 by the proper factor.

MATLAB  was  used  to  verify  the  consistency  of  the  simulation  with  the  theory. 
I. INTRODUCTION


A. BACKGROUND

In  Software  Defined  Radio,  the  modulation  and  demodulation  processes  are 
performed  in  the  digital  domain.  The  data  rate  of  the  transmitted  signal  is  usually  several 
orders of magnitude smaller than the data rate necessary to drive the Digital to Analog
Converters  (DACs)  at  the  radio  frequency  (RF).  In  real  time  implementations,  since  the 
cost  increases  according  to  the  number  of  samples  per  second,  we  need  to  adapt  the 
sampling  rate  to  the  frequency  content  of  the  transmitted  signal.  Therefore,  signals  at 
radio  frequency  (RF)  are  sampled  at  a  rate  comparable  to  the  RF  frequency,  while  the 
signals  at  baseband  are  sampled  at  the  information  rate  [1].  The  reconfigurability  and  the 
versatility  of  the  SDR  can  be  efficiently  supported  by  the  Field  Programmable  Gate 
Arrays  (FPGAs)  for  hardware  implementations. 

1.  FPGA  for  Digital  Signal  Processing 

The  Field  Programmable  Gate  Array  (FPGA)  is  a  high  performance  integrated 
circuit  suitable  for  Digital  Signal  Processing  (DSP)  applications.  An  FPGA  has  the 
feature  of  being  programmable  by  the  designer  and  it  can  be  easily  reprogrammed. 
Physically,  an  FPGA  is  a  two-dimensional  array  of  gates  consisting  of  various  logic  DSP 
blocks and interconnections between them in order to perform DSP operations [2].

Figure  1  shows  a  Virtex-4  FPGA  embedded  in  a  processing  board.  Figure  2  shows 
a  number  of  important  features  such  as  the  array  of  ‘slices’  disposed  in  columns  of 
macroblocks.  The  latter  are  blocks,  constituted  of  memory  and  arithmetic  units  that  are 
programmed  to  perform  suitable  operations.  The  entire  interconnected  mesh  can  be 
programmed  into  highly  parallel  algorithms  [2]. 
Figure 1. Actual view of FPGA VIRTEX 4 (From: [4]).


Figure  2.  Physical  view  of  FPGA  VIRTEX-4  (From:  [2]). 
2. Design Environment

The  Xilinx  DSP  blockset  is  a  suitable  tool  for  designing  FPGA  algorithms  in  the 
Mathworks  Simulink  design  environment.  This  is  supported  by  the  System  Generator 
(Sysgen), which is an FPGA design program responsible for driving the FPGA through the
high-level  design  environment  of  Simulink.  A  sufficient  number  of  common  and  complex 
blocks,  which  are  provided  from  several  blocksets  (including  the  Xilinx  blockset)  of  the 
Simulink  Library,  are  properly  synthesized  in  order  to  design  various  DSP  applications 
[5].  Figure  3  shows  on  the  left  the  Simulink  Library  Browser  with  various  basic  elements 
of  the  Xilinx  blockset,  and,  on  the  right  of  the  same  figure,  a  simple  application  in 
Simulink  using  Sysgen.  Specifically,  an  input  data  sequence  is  loaded  from  MATLAB’s 
workspace  and  upsampled  by  a  factor  of  two.  The  output  is  shown  on  the  ‘Scope’  by 
double  clicking  the  corresponding  icon.  Both  ‘in’  and  ‘out’  blocks  are  the  interfaces  of 
common  Simulink  blocks  with  the  Xilinx  blockset.  The  entire  system  is  controlled  by  the 
Sysgen  block.  The  specified  parameters  of  all  blocks  can  be  modified  by  the  user  when 
the  respective  icon  is  selected. 


Figure 3. Simulink environment using Xilinx.
B. OBJECTIVE


In  this  thesis,  we  address  the  problem  of  efficient  design  of  resampling  operations 
so  they  can  be  implemented  on  Field  Programmable  Gate  Arrays  (FPGAs).  The  key 
ingredient  is  the  use  of  a  Multiplier  and  Accumulator  (MAC)  architecture,  which  will 
allow  us  to  perform  the  following  operations: 

1)  Finite  Impulse  Response  (FIR)  filters 

2)  Decimation  by  an  integer  factor 

3)  Interpolation  by  an  integer  factor 

The  outcome  of  these  three  schemes  is  the  development  of  a  set  of  filtering  and 
resampling  operations  performed  in  Xilinx/Simulink.  All  the  subsystems  in  the  designs 
are  fully  accessible  by  the  designer. 

In  order  to  perform  the  three  operations  (FIR  filtering,  Decimation  and 
Interpolation  by  an  integer  factor),  a  basic  design  scheme  in  the  Simulink  environment  is 
used  and  is  modified  accordingly  to  fit  the  three  cases.  Since  the  objective  is  to  develop 
software suitable for programming FPGAs, a combination of Xilinx and Simulink blocks
as  well  as  MATLAB  codes  is  used.  Figure  4  illustrates  the  basic  structure  of  the 
Simulation. 


Figure  4.  Basic  Structure  of  Simulation. 
In each of the three designs, the proper arrangement of the input data points and
Finite  Impulse  Response  (FIR)  filter  coefficients  is  achieved  in  the  Dual  Port  Ram  Xilinx 
block,  which  is  a  random  access  memory.  The  dual  output  of  the  memory  block  is 
multiplied  and  accumulated  by  the  DSP48  Xilinx  block,  which  is  an  efficient  block  for 
DSP  operations  implementing  a  Multiplier  and  Accumulator  (MAC)  operation.  From  the 
resulting  output,  we  selectively  extract  the  data  points  of  interest  according  to  the 
theoretical  formulas  of  the  three  desired  operations.  Although  several  Xilinx/Simulink 
blocks  are  used  and  are  explained  in  the  next  chapters,  the  principal  blocks  are  the  Dual 
Port  Ram  and  the  DSP48. 

1.  Efficient  Use  of  the  Dual  Port  Ram  and  DSP48  Xilinx  Blocks 

The  Dual  Port  Ram  Xilinx  block  is  a  dual  memory  device  that  allows  the  user  to 
specify  the  width  and  the  values  of  each  memory  part.  This  specific  block  uses  two  sets 
of  ports  dedicated  to  reading  and  writing  of  data.  Each  port  has  three  inputs:  (a)  the 
address  line  ‘addr’,  (b)  the  input  data  ‘din’  and  (c)  the  write  enable  ‘we’.  In  addition,  each 
port  has  one  output.  There  is  also  an  option  of  additional  enable  and  synchronous  reset 
inputs  for  both  ports  that  were  not  necessary  for  the  purpose  of  this  design.  The  Dual  Port 
Ram  Xilinx  block,  along  with  its  specified  parameter  window,  is  shown  in  Figure  5. 


[Figure 5 image: Dual Port RAM block with input ports addra, dina, wea, addrb, dinb and
web, shown next to its parameter dialog (Xilinx Dual Port Random Access Memory), which
includes the options ‘Provide enable port for port A’ and ‘Provide enable port for port B’
and a Latency setting of 1.]

Figure  5.  Dual  Port  Ram  Xilinx  Block. 
Both memories are accessible for reading and writing by providing the right
address on the address ports ‘addra’ and ‘addrb’. The initial value vector, as
indicated in the parameter window of Figure 5, is the concatenation of the two initial
vectors (the initial data vector x0 and the initial FIR filter coefficients h). The ‘wea’ and
‘web’ ports are the write enable ports for each memory, and each is fed with a Boolean
signal ‘0’ or ‘1’. When the ‘we’ port is set to 1, the memory writes the value of the
‘din’ port to the location specified by the corresponding address line. Each of the two
outputs depends on the write mode, which in our case is ‘read after write’: the output takes
the value stored at the location indicated by the address line once the write cycle is
completed [5].
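
The port behavior described above can be mimicked in a few lines of Python. This is only a behavioral sketch under stated assumptions: the class name `DualPortRam`, the method `access`, and the example memory contents are hypothetical, the block latency is ignored, and nothing here models the internals of the Xilinx block.

```python
class DualPortRam:
    """Behavioral sketch of a dual-port memory in 'read after write' mode."""

    def __init__(self, initial_values):
        # One shared storage array; either port may address any location.
        self.mem = list(initial_values)

    def access(self, addr, din, we):
        """Service one port for one clock cycle and return its output."""
        if we:
            # Write enable high: store 'din' at the addressed location.
            self.mem[addr] = din
        # Read after write: the output shows the value held at 'addr'
        # once the write (if any) has completed.
        return self.mem[addr]


# Hypothetical example: 3 data locations followed by 4 coefficient locations.
x0 = [0, 0, 0]
h = [5, 6, 7, 0]
ram = DualPortRam(x0 + h)      # initial vector is the concatenation [x0, h]

out_a = ram.access(addr=1, din=42, we=True)    # port A writes a data sample
out_b = ram.access(addr=3, din=0, we=False)    # port B only reads a coefficient
print(out_a, out_b)
```

Port B is driven with a zero input and a false write enable, so the coefficient part of the memory keeps its initial values, as in the design described here.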

For  the  purposes  of  this  thesis,  the  second  part  of  the  memory  remains  unchanged 
(no  input  data)  and  keeps  the  initial  value.  Specifically,  input  b  takes  the  values  of 
properly  ordered  (according  to  the  case  of  interest)  finite  impulse  response  (FIR)  filter 
coefficients,  which  are  generated  in  the  initialization  of  the  simulation  through  any 
MATLAB  function  such  as  ‘firpm’.  Therefore  ports  ‘dinb’  and  ‘web’  are  fed  with  a 
signed  and  a  boolean  zero  respectively.  On  the  other  hand,  the  first  part  of  the  memory 
changes  according  to  ‘address’  and  ‘write  enable’  ports. 

The  outputs  of  ports  A  and  B  are  two  signed  bit  streams:  one  for  the  input  data 
points  and  one  for  coefficients  of  the  FIR  filter,  aligned  in  such  a  way  so  that  their 
multiplication  and  accumulation  will  provide  us  the  desired  result  for  the  three  examined 
cases. 

The DSP48 Xilinx block (also referred to as an XtremeDSP slice or DSP48 math
slice) is an efficient tool for many DSP applications, which can handle many operations
dynamically and can be cascaded with other DSP48 blocks. It consists of an 18-bit-by-18-bit
signed multiplier with a 48-bit adder and a programmable multiplexer that can be
driven as required to perform specific operations [3]. The logic circuit of the slice is
depicted in Figure 6, while the corresponding Xilinx block, along with some of the
operations it supports, is shown in Figure 7.
Figure 6. DSP48 Slice (From: [6]).


[Figure 7 image: DSP48 block and its parameter dialog, with the operation selected as
P = P + (A*B). The dialog notes that the use of this block for DSP48 instructions is
deprecated in favor of the Opmode block.]


Figure  7.  DSP48  Xilinx  Block. 
In this thesis the DSP48 is used as a multiplier and accumulator (MAC) block and
its operation is defined as P = P + A·B. With this block, the product of the two inputs A and
B (taken from the Dual Port Ram) is added at each clock cycle to the previously accumulated
value P. A reset port is available on the slice so that the accumulated output can be cleared
at the appropriate clock cycles, as required by each of the examined operations.

C.  RELATED  WORK 

Although  a  number  of  approaches  to  FIR  filtering  and  resampling  operations 
design  exist  in  literature  ([10],  [11]),  to  the  best  knowledge  of  the  author  there  has  been 
no  systematic  way  of  designing  these  filters  in  a  general  fashion. 

The main contribution of this research is an architecture that is fully scalable to
any  implementation  in  terms  of  filter  coefficients  and  resampling  factor. 




II.  FINITE  IMPULSE  RESPONSE  FILTER  WITH  ONE  MAC 
(MULTIPLIER  ACCUMULATOR) 


A.  THEORETICAL  PERSPECTIVE 

In the digital domain, the output sequence $y[n]$ of a Finite Impulse Response
(FIR) filter is given by the following expression:

$$y[n] = \sum_{k=0}^{N} h[k]\, x[n-k], \tag{2.1}$$

where $h[n]$ is the impulse response of the filter, $x[n]$ is the input sequence and $N$ is
the degree of the transfer function of the FIR filter.

Both $x[n]$ and $y[n]$ are at the same clock rate $F_x = F_y = F_s$, as $x[n] = x(nT_s)$ and
$y[n] = y(nT_s)$, where $T_s = 1/F_s$ is the sampling interval [7]. The discrete convolution,
along with its graphical representation, is depicted in Figure 8.


Figure  8.  Discrete  Convolution. 
We can verify from Figure 8 that the convolution operation can be graphically
implemented as a sliding window over a data sequence. In particular, at any time n we
need to save N + 1 data points x[n], x[n-1], ..., x[n-N] together with the
coefficients h[0], h[1], ..., h[N].

In  this  chapter,  we  address  the  problem  of  implementing  the  filtering  operation 
using  one  Multiplier  and  Accumulator  (MAC).  In  this  way,  the  convolution  sum  is 
computed  in  about  N  clock  pulses  (where  N  denotes  the  degree  of  the  transfer  function 
of  the  FIR  filter),  thus  requiring  a  higher  clock  rate  to  be  provided  by  the  System 
Generator,  which  controls  the  operation  and  its  parameters.  The  objective  is  to  perform 
the  proper  arrangement  of  the  input  data  points  and  the  filter’s  coefficients  so  that  the 
multiplication  and  accumulation  procedure  as  well  as  the  selective  extraction  of  outcomes 
will  give  us  the  desired  convolution  result  in  the  most  efficient  way. 
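
The idea of trading hardware for clock cycles can be illustrated with a small Python sketch. It is not the Simulink design: the function names `fir_direct` and `fir_single_mac` and the numeric values are assumptions chosen for illustration. The sketch only shows that a single multiply-accumulate unit, stepped once per fast clock pulse, reproduces the convolution sum of equation (2.1).

```python
def fir_direct(h, x):
    """Reference FIR output y[n] = sum_k h[k] x[n-k], zero initial conditions."""
    y = []
    for n in range(len(x)):
        y.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return y


def fir_single_mac(h, x):
    """Same filter, computed with one multiply-accumulate per clock pulse."""
    y = []
    mac_cycles = 0
    for n in range(len(x)):
        p = 0                          # accumulator cleared for each output
        for k in range(len(h)):        # one MAC operation per fast clock pulse
            sample = x[n - k] if n - k >= 0 else 0
            p = p + h[k] * sample      # P = P + A*B
            mac_cycles += 1
        y.append(p)
    return y, mac_cycles


h = [1, 2, 3]          # hypothetical coefficients (N = 2)
x = [4, 5, 6, 7]       # hypothetical input samples
y_serial, cycles = fir_single_mac(h, x)
print(y_serial, cycles)
```

Each output sample costs as many MAC cycles as there are coefficients, which is why the serial design needs a clock faster than the input sample rate.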

B.  SOFTWARE  IMPLEMENTATION 

The Simulink/Xilinx implementation needed to perform the FIR filtering is shown
in Figure 9. The main components of the implementation are the Dual Port Ram, which
contains both data and the FIR filter coefficients, and the DSP48, which performs the
multiplication and addition on a sequential basis. Since the operations are performed
serially, the data need to be upsampled in order to handle the increase of the clock rate
provided by System Generator. The controller consists of a set of counters (one for the
coefficients and one for the data points) along with logic blocks (implemented in the Xilinx
blockset), and controls the flow of the data at the output of the Dual Port Ram as well as
the timing of the operations.
Figure 9. Finite Impulse Response Filter with One MAC.

In order to test the performance of the filter, Gaussian white noise and a
sinusoidal signal are selectively available (by a manual switch) as inputs. The input signal
is sampled at rate $F_s$, while System Generator (Sysgen) works at a higher sampling rate
equal to $(N+1)F_s$. Since the new system rate provided by Sysgen is higher, the input
data is upsampled by the integer factor N + 1 with the corresponding Xilinx block.

The  objective  is  to  achieve  a  proper  alignment  of  the  data  and  filter’s  coefficients, 
so  that  they  can  be  applied  to  a  MAC  resulting  in  the  convolution  operation.  Towards  this 
goal,  we  need  two  memory  vectors  x  and  h  containing  the  data  and  the  filter  coefficients 
respectively  provided  by  the  Dual  Port  Ram  and  a  MAC  provided  by  the  DSP48. 

1.  Control  Logic  for  Data  and  Filter  Coefficients 

The vector h of the filter coefficients is defined as
h = [h[0], h[1], h[2], ..., h[N-1], 0]. It has length N + 1 and it remains unchanged during
the operation of the filter. Therefore, the ports ‘dinb’ (data input b) and ‘web’ (write
enable b) are set to zero and false, respectively. The first N coefficients of the vector h
are generated in MATLAB as an FIR filter using the function ‘firpm’, while the additional
(N + 1)th coefficient is intentionally set to zero to accommodate a computational
requirement that arises from using the DSP48 as a MAC, as explained in the MAC
procedure.

The input data vector stored in the first part of the memory of the Dual Port Ram
is a circular shift register of length N, updated at times $t = nT_s$ by
$x[(n)_N] \leftarrow x[n]$, with $(n)_N = 0, 1, \ldots, N-1$ denoting the modulo operation. In the
implementation, $(n)_N$ is a periodic counter with update rate $F_{ac} = (N+1)F_s$. The initial
value of the memory vector x is set to the initial conditions (zero, for example) and
is updated according to the corresponding ‘address’ and ‘write enable’ ports
provided by the controller. Figure 10 illustrates the controller of this design.
Figure 10. Controller.
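
The circular update of the data memory, in which sample x[n] is written to address n modulo N, can be sketched as follows. The helper name `write_sample`, the memory length and the sample values are assumptions used only for illustration; the actual address generation is performed by the counters of the controller.

```python
def write_sample(mem, n, sample):
    """Store x[n] at address (n)_N = n mod N, overwriting the oldest sample."""
    addr = n % len(mem)
    mem[addr] = sample
    return addr


N = 4
mem = [0] * N                      # initial conditions set to zero
addresses = []
for n, sample in enumerate([10, 11, 12, 13, 14, 15]):
    addresses.append(write_sample(mem, n, sample))

print(addresses)   # the write address wraps around after N samples
print(mem)         # the memory always holds the N most recent samples
```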

The time representation of the ‘data address’ and ‘coefficients address’ sequences of
Figure 10 is shown in Figure 11. In particular, at time $(n-1)T_s$ the accumulator is
initialized by $a((n-1)T_s) = 0$ (where $a$ denotes the content of the accumulator). At
every subsequent clock cycle $T_{ac} = 1/F_{ac}$ the accumulation is updated as

$$a\big((n-1)T_s + \lambda T_{ac}\big) = a\big((n-1)T_s + (\lambda - 1)T_{ac}\big) + \mathrm{data\_addr}[\lambda] \cdot \mathrm{coeff\_addr}[\lambda],$$

where $\lambda = 1, \ldots, N$. The output $y[n]$ at time $nT_s$ is shown in the timing diagram of
Figure 11.
[Timing diagram: input data x[n-1] and x[n] arrive at times n-1 and n, separated by
Ts = (N+1)Tac. Between them the data address steps through (n-N)_N, ..., (n-k)_N, ...,
(n-1)_N, (n)_N while the coefficient address steps through N, ..., k, ..., 1, 0, one address
per fast clock period Tac. Output data y[n-1] and y[n] are produced at times n-1 and n.]

Figure 11. Time Representation of Simulation.

In  what  follows  we  demonstrate  the  functionality  of  the  design  according  to  the 
timing diagram illustrated in Figure 11.

2.  Alignment  of  Data  and  Filter  Coefficients  in  the  Dual  Port  Ram 

The length of the vector x is chosen to be one less than the length of h so that
the writing procedure introduces a shift of one position in the content of memory
x. It can be inferred that the outcome of the Dual Port Ram is a set of bitstreams, where
the output at port A is a recurrent window of length N + 1 (one in every $T_s$) in which the
input data is progressively shifted by one position from left to right, while the bitstream
of port B is a repetition of the vector h. Figure 12 illustrates the outcome of the Dual
Port Ram with time running from right to left.
[Figure 12 image: Dual Port RAM with initial memory vector
[x, h] = [0, ..., 0, h[0], h[1], ..., h[N-1], 0]. Port B outputs the block
h[0], h[1], ..., h[N-1], 0 of length N+1 repeatedly.]


Figure  12.  Outcome  of  Dual  Port  Ram. 

3.  Sequential  Multiplication  and  Accumulation  (MAC)  of  Data  and 
Filter  Coefficients  using  DSP48 

The output bitstream from the Dual Port Ram, as shown in Figure 12, is
processed by the DSP48 Xilinx block, which works as a MAC. Its operation mode is
defined as P = P + A·B (referring to Figure 7), where the product of each pair of values from
the Dual Port Ram output ports A and B is added to the previously accumulated
value. A reset signal (selected from the DSP48 options) for the outcome P is
asserted once every N + 1 clock cycles and is derived from the properly delayed ‘write
enable’ signal of the controller of the Dual Port Ram (referring to Figure 9). The delay
is adjusted so that the reset of the outcome P occurs once every N + 1 cycles, at the cycle
where a data sample is multiplied by the zero coefficient of vector h.
Consequently, considering the length (N + 1) of the block pairs in Figure 12,
whenever a product x[k]·h[0] (with k arbitrarily chosen) is accumulated onto the previous
N sums of products of each block pair, a data point of the convolution y[n] is
produced, as shown in the timing diagram (Figure 11). For illustration, Figure 13 shows
the first two points y[0], y[1], computed at times (N + 1)Tac and 2(N + 1)Tac,
respectively, by the first two sets of blocks.


[Figure 13 image: two consecutive blocks of length N+1 from port A (x[0], 0, ..., 0
followed by ..., x[1], x[0], 0, ..., 0) are multiplied element by element with
h[0], h[1], h[2], ..., h[N-1], 0. After accumulation of all N+1 products of pairs, the
first block yields x[0]h[0] = y[0] and the second yields x[0]h[1] + x[1]h[0] = y[1].]


Figure  13.  Outcome  of  DSP48. 

Referring to Figure 13, the bitstream outcome P of the DSP48 can be considered as a
set of blocks of length N + 1 in which the desired convolution values are embedded
in the (N + 1)th element of each block, as shown in the timing diagram in Figure 11.
Therefore, by downsampling the data P by the factor N + 1 (the same factor that was
used when the input data was upsampled), the desired convolution result is obtained.
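
The chain of circular data memory, zero-padded coefficient vector, resettable accumulator and final downsampling can be imitated in Python. This is a simplified sketch, not the thesis design: the function name `serial_fir_stream` is hypothetical, and the ordering inside each block (zero coefficient first, reset on that cycle, output taken from the last element of the block) is an assumption that may differ from the address sequence generated by the actual controller.

```python
def serial_fir_stream(h, x):
    """Return the fast-rate accumulator stream P and the downsampled output."""
    n_taps = len(h)
    block = n_taps + 1                  # fast clock cycles per input sample
    h_mem = [0] + list(h)               # padding zero placed first (assumed ordering)
    x_mem = [0] * n_taps                # circular data memory, zero initial state
    p = 0
    stream = []
    for n, sample in enumerate(x):
        x_mem[n % n_taps] = sample      # write x[n] at address (n)_N
        for j in range(block):
            if j == 0:
                p = 0                   # reset coincides with the zero coefficient
            else:
                k = j - 1               # tap index 0 .. N-1
                a = x_mem[(n - k) % n_taps]   # x[n-k] read from the circular memory
                p = p + a * h_mem[j]    # P = P + A*B
            stream.append(p)
    y = stream[block - 1::block]        # keep every (N+1)-th element
    return stream, y


h = [1, 2, 3]          # hypothetical coefficients h[0..N-1]
x = [4, 5, 6, 7]       # hypothetical input samples
stream, y = serial_fir_stream(h, x)
print(stream)
print(y)
```

The fast-rate stream is N + 1 times longer than the input, and only one element per block carries a completed convolution sum, which is why the final downsampling by N + 1 is needed.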
C. RESULTS

In order to test the performance, an FIR filter was designed and tested with two
classes of input signals. In particular, the FIR filter has been designed as an Equiripple
Filter with the following characteristics:

Passband: 0-0.2 (in terms of digital frequency f)

Stopband: 0.3-0.5 (in terms of digital frequency f)

Order: 60

The signals tested are a sinusoid and white noise. The sinusoid has frequency
F = 0.1·Fs (Hz) with sampling frequency Fs = 10000 (Hz) and f = F/Fs, while the
white noise is sampled at the same rate.

The  frequency  spectrum  of  the  original  signal  and  the  resulting  filtered  signal  for 
the  sinusoidal  case  is  shown  in  Figure  14.  We  can  verify  that  the  frequency  spectrum  of 
the  original  signal  remains  the  same  as  long  as  its  frequency  is  within  the  passband  of  the 
FIR  filter. 
Figure 14. Frequency Spectrum of the Original and Filtered Signal (Sinusoidal Case).

For the Gaussian white noise case the corresponding frequency spectrum, along
with the frequency spectrum of the filtered signal, is depicted in Figure 15. We can
observe that the Gaussian white noise is spread over the entire frequency
spectrum, while the spectrum of the corresponding filtered signal retains the
frequencies that are within the passband of the FIR filter and attenuates all the others.
Figure 15. Frequency Spectrum of the Original and Filtered Signal (Gaussian White Noise Case).
III. DECIMATION BY AN INTEGER FACTOR


A.  THEORETICAL  PERSPECTIVE 

1. Sampling Continuous Time Signals

It is well known that, by the sampling theorem, the sampling frequency $F_s$ has to
be at least twice the signal bandwidth $B$ [7]. The Discrete Time Fourier Transform of a
sampled signal $x[n]$ with actual frequency content $F$, which is sampled at rate $F_s$, is
given by the following expression:

$$X(f) = \mathrm{DTFT}\{x[n]\} = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi f n}, \tag{3.1}$$

where $f$ is a dimensionless quantity denoting the digital frequency $f = F/F_s$. From
equation (3.1) we can verify that $X(f)$ is periodic with period one since

$$X(f+1) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi (f+1) n} = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi f n} = X(f).$$

Therefore, the information is contained in one period (within the interval
$-1/2 < f < 1/2$) of the periodic repetition of the frequency spectrum. Figure 16
illustrates the frequency spectrum of a continuous time and sampled signal respectively.
Figure 16. Sampling Continuous Time Signals.
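
The periodicity of the DTFT in the digital frequency can be checked numerically for a finite sequence. The sketch below is only an illustration: the helper name `dtft`, the test sequence and the chosen frequency are arbitrary assumptions.

```python
import cmath


def dtft(x, f):
    """DTFT of a finite sequence x[0..L-1] evaluated at digital frequency f."""
    return sum(xn * cmath.exp(-2j * cmath.pi * f * n) for n, xn in enumerate(x))


x = [1.0, -2.0, 0.5, 3.0]        # arbitrary short test sequence
f = 0.13
difference = abs(dtft(x, f + 1) - dtft(x, f))
print(difference)                # at rounding-error level
```

Shifting the frequency by one multiplies each term by a factor equal to one for every integer n, so the two evaluations agree up to floating-point rounding.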

2.  Analysis  of  Downsampling  (Decimation) 

In  digital  communications  such  as  Software  Defined  Radio,  the  exchange  of 
information  needs  to  be  done  in  the  most  efficient  way,  in  order  to  reduce  complexity  and 
improve  efficiency  while  preserving  the  content  of  the  information.  The  Downsampling 
operation  (Decimation)  decreases  the  number  of  samples  per  second  of  a  given  signal  by 
an  integer  factor  of  D  .  An  example  of  decimation  by  integer  factor  of  D  =  3  is  shown  in 
Figure  17. 
[Figure 17 image: downsampling of x[n] by an integer factor D = 3, in which one sample is
kept out of every D = 3 samples of x[n].]

Figure  17.  Downsampling  Operation. 

Consequently, the decimation procedure introduces a loss of information due to
the elimination of some data points, so care is needed to preserve the
necessary information of the signal. The distortion caused by the downsampling
operation appears as additional frequency components in the frequency spectrum of
the resampled signal. This phenomenon is called aliasing and it is avoided by properly
filtering the signal before downsampling [8].
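
A small numerical experiment makes the aliasing effect concrete. The sketch assumes a decimation factor D = 3 and a tone at digital frequency 0.4, both chosen only for illustration; the helper name `downsample` is likewise hypothetical.

```python
import math


def downsample(x, d):
    """Keep one sample out of every d samples of x."""
    return x[::d]


D = 3
n_samples = 30
# Tone at digital frequency 0.4, which lies above 1/(2D) = 1/6.
x = [math.cos(2 * math.pi * 0.4 * n) for n in range(n_samples)]
y = downsample(x, D)

# After decimation the tone sits at 0.4 * 3 = 1.2, which cannot be
# distinguished from 0.2 because the spectrum has period one.
alias = [math.cos(2 * math.pi * 0.2 * m) for m in range(len(y))]
max_error = max(abs(a - b) for a, b in zip(y, alias))
print(len(y), max_error)
```

The decimated samples coincide with those of a tone at digital frequency 0.2, so without prior low-pass filtering the original frequency can no longer be recovered.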

When a signal sampled at rate $F_{s_1}$ with frequency spectrum $X(f_1)$ (in terms of
digital frequency) is resampled at a lower sampling rate $F_{s_2} = F_{s_1}/D$ (where $D$ is an
integer), the resulting frequency spectrum of the resampled signal is given by the
following expression [8]:

$$Y(f_2) = \frac{1}{D} \sum_{k=0}^{D-1} X\!\left(\frac{f_2}{D} - \frac{k}{D}\right). \tag{3.2}$$

From equation (3.2) it is easy to show that no aliasing occurs if the signal has no
frequencies above $|f_1| = \frac{1}{2D}$, in which case equation (3.2) becomes
$Y(f_2) = \frac{1}{D} X\!\left(\frac{f_2}{D}\right)$ [9]. Figure 18 illustrates this concept.


Figure  18.  Aliasing  Effect  in  Frequency  Spectrum. 


Generally, in order to efficiently downsample a noisy signal by an integer factor
of $D$, with information frequency content within the interval
$\left(-\frac{1}{2D}, \frac{1}{2D}\right)$ and without introducing aliasing, it is necessary to
filter the signal first with an appropriate Low Pass Filter (LPF). In this way, the useful
part of the frequency spectrum is protected from aliased frequencies caused by noise.
Figure 19 illustrates this, along with the specifications of the appropriate Low Pass
Filter (LPF).
Figure 19. Filtering and Downsampling a Discrete Signal.

3.  Efficient  Implementation  of  Decimation  Operation  using  Noble 
Identities  and  Filter’s  Polyphase  Decomposition 

An efficient way of implementing filtering and downsampling operations is by
using the Noble identities and the filter’s polyphase decomposition. Since the filter in
Figure 19 is operated at the higher sampling rate $F_{s_1}$, it is desirable for the filter to be
placed after the downsampling operation, resulting in a significant decrease in the
number of operations since $F_{s_2} < F_{s_1}$. It is well known that by the polyphase
decomposition of the filter and the Noble Identities the downsampling operation can be
implemented as in Figure 20. In particular, the signal is buffered into $D$ components at
the lower sampling rate and each component is filtered by the corresponding polyphase
component of the Low Pass Filter [8].
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Figure  20.  Efficient  Implementation  of  Decimation. 

B.  DECIMATION  BY  TWO  WITH  FIR  MAC  AND  POLYPHASE 
DECOMPOSITION 

In case of decimation by an integer factor $D = 2$ we can relate the input and
output signal as

$$y[n] = \sum_{k=0}^{2N-1} h[k]\, x[2n - k], \qquad (3.3)$$

where $x[n] = x(nT_s)$, and $T_s$ is the sampling interval. Consequently, the output is
sampled at half the input rate.

The FIR filter polyphase decomposition provides two components, one for the
even samples $h_0[k] = h[2k]$ and one for the odd samples $h_1[k] = h[2k+1]$. Therefore,
equation (3.3) can be rewritten as

$$y[n] = \sum_{k=0}^{N-1} h_0[k]\, x[2(n-k)] + \sum_{k=0}^{N-1} h_1[k]\, x[2(n-k)-1], \qquad (3.4)$$

which breaks down the computation into two phases associated with the even and odd
samples, respectively. Equation (3.4) can be rewritten as

$$y[n] = h_0[0]\, x[2n] + \sum_{k=1}^{N-1} h_0[k]\, x[2(n-k)] + h_1[0]\, x[2n-1] + \sum_{k=1}^{N-1} h_1[k]\, x[2(n-k)-1]. \qquad (3.5)$$

Equation (3.5) highlights the fact that, during the computational interval
$(2n-2)T_s < t < (2n)T_s$, the data vector needs to be updated with samples $x[2n-1]$ and
$x[2n]$, while the data in the two summations are available before time $(2n-2)T_s$.
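
The equivalence between the direct form (3.3) and the two-phase form (3.4) can be checked numerically. The following is a minimal sketch in plain Python, not the Simulink/Xilinx design itself; the function names are hypothetical, and samples outside the input record are assumed to be zero.

```python
def x_at(x, i):
    """Return x[i], treating samples outside the record as zero (assumption)."""
    return x[i] if 0 <= i < len(x) else 0.0


def decimate2_direct(h, x):
    """Equation (3.3): y[n] = sum over k of h[k] * x[2n - k]."""
    return [sum(h[k] * x_at(x, 2 * n - k) for k in range(len(h)))
            for n in range(len(x) // 2)]


def decimate2_polyphase(h, x):
    """Equation (3.4): even and odd phases computed separately, then added."""
    h0 = h[0::2]  # h0[k] = h[2k]
    h1 = h[1::2]  # h1[k] = h[2k + 1]
    y = []
    for n in range(len(x) // 2):
        even = sum(h0[k] * x_at(x, 2 * (n - k)) for k in range(len(h0)))
        odd = sum(h1[k] * x_at(x, 2 * (n - k) - 1) for k in range(len(h1)))
        y.append(even + odd)
    return y
```

Both functions return the same sequence for any filter and input, which is the property the time-multiplexed MAC design relies on.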

1.  Software  Implementation 

The Simulink/Xilinx implementation needed to perform the decimation-by-two
has the same structure as the model presented in Figure 4, with modified parameters to
match this case. Specifically, the initial values of the vectors of the Dual Port Ram, along
with the controller (the logic circuit responsible for arranging data points and FIR filter
coefficients), are changed in order to implement equation (3.5). Furthermore, the input
data is upsampled to a rate equal to the System Generator's clock rate, and the output is
downsampled by twice the upsampling factor, implementing the decimation-by-two
operation. Figure 21 illustrates the structure of this specific design.

Figure 21. Downsampling by Two.

In order to test the performance of the simulation a sinusoidal signal is provided
as an input. The input signal is sampled at rate $F_s$ while System Generator (Sysgen)
works at a higher sampling rate equal to $N F_s$, with $2N - 2$ being the degree of the
transfer function of the FIR filter which is decomposed into its polyphase components.
The generation of the polyphase filter is accomplished in the initialization of the
simulation. Since the new system rate provided by Sysgen is higher, the input data is
upsampled by the integer factor $N$ with the corresponding Xilinx block.

The  objective  is  to  achieve  a  proper  alignment  of  the  data  and  filter’s  coefficients 
so  that  they  can  be  applied  to  a  MAC  resulting  in  the  decimation-by-two  operation. 
Towards  this  goal,  we  need  two  memory  vectors  x  and  h ,  containing  the  data  and  the 
filter  coefficients  provided  by  the  Dual  Port  Ram,  and  a  MAC  provided  by  the  DSP48. 

a. Control Logic for Data and Filter Coefficients

The vector $\mathbf{h}$ of the filter coefficients is defined as

$$\mathbf{h} = [h[0], h[2], \ldots, h[2N-2], h[1], h[3], \ldots, h[2N-3], 0]$$

and it is the concatenation of the two polyphase components (one for the even and one
for the odd samples) of a FIR filter of length $2N-1$ (which is generated in MATLAB),
with an additional zero at the end. The vector $\mathbf{h}$ has total length $2N$ and remains
unchanged during the downsampling-by-two operation. Therefore the ports ‘dinb’ (data
input b) and ‘web’ (write enable b) are set to false. The final zero coefficient of vector
$\mathbf{h}$ is added to handle a computational issue that arises from using the DSP48 as a MAC,
which is explained in the MAC procedure.
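
As an illustration of this memory layout, the sketch below builds the coefficient vector from a filter of odd length $2N-1$. It is a plain Python illustration under that assumption, and the function name is hypothetical; in the thesis the vector is generated in MATLAB during initialization.

```python
def decimation_coeff_memory(h):
    """Even phase, then odd phase, then one trailing zero (total length 2N)."""
    if len(h) % 2 == 0:
        raise ValueError("expected an odd filter length 2N - 1")
    even = list(h[0::2])  # h[0], h[2], ..., h[2N-2]  (N values)
    odd = list(h[1::2])   # h[1], h[3], ..., h[2N-3]  (N - 1 values)
    return even + odd + [0.0]
```

For a five-tap filter ($N = 3$) the result has $2N = 6$ entries.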

The input data vector stored in the first part of the memory of the Dual
Port Ram is a vector $\mathbf{x}$ of length $2N$, updated at times $t = nT_s$ as
$\mathbf{x}[(n)_{2N-1}] \leftarrow x[2n]$ for the even samples and
$\mathbf{x}[(n-N)_{2N-1}] \leftarrow x[2n-1]$ for the odd samples, with
$(n)_{2N-1} = 0, 1, \ldots, 2N-2$ denoting the modulo operation. In the implementation,
$(n)_{2N-1}$ is a periodic counter with update rate $F_{ac} = N F_s$. The initial value of the
memory vector $\mathbf{x}$ is set to the initial conditions (say zero, for example) and it updates
its value according to the corresponding ‘address’ and ‘write enable’ ports provided by the
controller. Figure 22 illustrates the structure of the controller.

Figure 22. Controller.

The time representation of the ‘data address’ and ‘coefficients address’
sequences of Figure 22 is shown in Figure 23. In particular, at time $(2n-2)T_s$ the
accumulator is initialized by $a((2n-2)T_s) = 0$ (where $a$ denotes the accumulation
function). At every subsequent clock cycle $T_{ac} = 1/F_{ac}$ the accumulation will be
updated by

$$a((2n-2)T_s + \lambda T_{ac}) = a((2n-2)T_s + (\lambda-1)T_{ac}) + \text{data\_addr}(\lambda) \cdot \text{coeff\_addr}(\lambda),$$

where $\lambda = 1, \ldots, 2N$. At time $(2n-1)T_s = (2n-2)T_s + N T_{ac}$ the input data is
updated. The output $y[n]$ at time $2nT_s$ is shown in the timing diagram of Figure 23.

Figure 23. Time Representation of Simulation. (Timing diagram; labeled signals include
input data x[2n-2] and input data x[2n-1].)

In  order  to  demonstrate  the  functionality  of  the  implementation,  Figure  23 
illustrates  the  timing  of  the  various  signals  involved. 

The  outcome  of  the  Dual  Port  Ram  is  a  set  of  bitstreams,  one  from  port  A 
(data  points)  and  one  from  port  B  (filter  coefficients).  It  can  be  inferred  that  the  outcome 
of  port  A  is  a  recurrent  window  of  length  2N ,  which  is  subdivided  into  two  windows 
(one  for  the  even  samples  and  one  for  the  odd  samples  of  input  signal)  of  length  N  .  At 
every  time  Tac  both  the  even  and  the  odd  samples  are  updated,  introducing  a  shift  by  one 

position  from  left  to  right.  The  bitstream  of  port  B  is  a  repetition  of  the  vector  h  .  Figure 
24  illustrates  the  outcome  of  Dual  Port  Ram. 

Figure 24. Outcome of Dual Port Ram.

b.  Sequential  Multiplication  and  Accumulation  (MAC)  of  Data  and 
Filter  Coefficients  using  the  DSP48 

The output bitstream from the Dual Port Ram, as shown in Figure 24, is
processed by the DSP48 Xilinx block, which works as a MAC. Its operation mode
is defined by $P = P + A \cdot B$ (referring to Figure 7), where the product of each output
pair $A$ and $B$ of the Dual Port Ram is accumulated with the previous products. A reset
signal (selected from the DSP48 options) for the outcome $P$ is introduced every $2N$ clock
cycles, provided by the properly delayed ‘write enable 1’ signal of the controller of the
Dual Port Ram (referring to Figure 22). The delay is adjusted so that the reset of the
outcome $P$ occurs every $2N$ cycles, at the point where a data sample is multiplied by the
zero coefficient of vector $\mathbf{h}$.

Consequently, considering the length ($2N$) of the block pairs in Figure 24,
after the last product $x[k] \cdot h[0]$ is accumulated to the previous $2N$ sums of products of
each block pair, a data point $y[n]$ of the decimation-by-two operation is generated, as also
shown in the timing diagram (Figure 23). Figure 25 shows the first two points $y[0]$, $y[1]$
computed at times $(2N)T_{ac}$ and $(4N)T_{ac}$.


Figure 25. Outcome of DSP48. (Blocks of length 2N; example shown in the figure:
x[2]h[0] + x[0]h[2] + x[1]h[1] = y[1].)

Referring to Figure 7, the bitstream outcome $P$ of the DSP48 can be
considered as a set of blocks of length $2N$ in which the desired output samples of the
decimation-by-two operation are embedded in every $(2N)$th element of each block, as
shown in the timing diagram in Figure 23. Therefore, by downsampling the data $P$ by
the factor $2N$ (twice the factor that was used when the input data was upsampled) the
desired decimation-by-two operation is performed.
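
The accumulate, reset and downsample sequence described above can be mimicked in software. The sketch below is a behavioral model only, assuming zero initial conditions and an idealized reset at the start of each block of $2N$ cycles; the function names are hypothetical and the exact read order of the Dual Port Ram in the hardware design may differ.

```python
def x_at(x, i):
    """Return x[i], treating samples outside the record as zero (assumption)."""
    return x[i] if 0 <= i < len(x) else 0.0


def mac_decimate2(h, x):
    """Model of a time-multiplexed MAC: 2N accumulate cycles per output sample."""
    n_taps = (len(h) + 1) // 2                      # N, filter length is 2N - 1
    coeffs = list(h[0::2]) + list(h[1::2]) + [0.0]  # coefficient memory, length 2N
    p_stream = []                                   # value of P after every cycle
    for n in range(len(x) // 2):
        even = [x_at(x, 2 * (n - k)) for k in range(n_taps)]
        odd = [x_at(x, 2 * (n - k) - 1) for k in range(n_taps - 1)]
        # The last datum meets the zero coefficient, so its value is irrelevant.
        data = even + odd + [x_at(x, 2 * n)]
        acc = 0.0                                   # reset of P for this block
        for a, b in zip(data, coeffs):
            acc += a * b                            # P = P + A * B
            p_stream.append(acc)
    # Downsample P by 2N: keep the last element of every block.
    return p_stream[2 * n_taps - 1::2 * n_taps]
```

Keeping only every $(2N)$th value of the accumulator stream yields the same samples as equation (3.3).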

c. Results


In order to test the performance of the simulation a sinusoidal waveform
with frequency $F = 0.2 \cdot F_s$ (Hz) and sampling frequency $F_s = 1$ (Hz) is applied as an
input.

The  FIR  filter  has  been  designed  as  an  Equiripple  filter  and  decomposed 
into  two  polyphase  components  with  the  following  characteristics: 

Passband: 0-0.3 (in terms of Digital Frequency f)

Stopband: 0.4-0.5 (in terms of Digital Frequency f)

Order: 65


The frequency spectrum of the original and the downsampled-by-two
signal is shown in Figure 26. We can verify that the frequency spectrum of the
downsampled-by-two signal is stretched (in terms of the digital frequency) by the integer
factor of two compared to the frequency spectrum of the original signal. Since the
bandwidth of the signal is less than $\frac{1}{2D}$ there is no aliasing effect. Therefore, the
frequency of the original signal is $f = 0.2$ while the frequency of the downsampled-by-two
signal is $f = 2 \times 0.2 = 0.4$ (where $f$ is the dimensionless digital frequency).

Figure 26. Frequency Spectrum of the Original and Downsampled by Two Signal
without Aliasing Effect.

In order to demonstrate the aliasing effect in the frequency spectrum of a
downsampled signal, a sinusoidal waveform with frequency $F = 0.3 \cdot F_s$ (Hz) is applied
as an input. Since the new bandwidth (0.3) exceeds the factor $\frac{1}{2D}$ (where $D = 2$), the
new frequency spectrum of the downsampled-by-two signal in the interval
$\left(-\frac{1}{2}, \frac{1}{2}\right)$ will contain aliased frequencies derived from the periodic
repetition of one period of the frequency spectrum. Figure 27 illustrates this example.

Figure 27. Frequency Spectrum of the Original and Downsampled by Two Signal with
Aliasing Effect.
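
The two test cases can be summarized with a short calculation. The sketch below is a minimal illustration with hypothetical function names; it assumes a single tone that passes the filter unattenuated, and it wraps the stretched frequency back into the interval $[-0.5, 0.5)$ to show where the tone lands after downsampling.

```python
def downsampled_frequency(f, d):
    """Digital frequency of a tone at f after downsampling by d, in [-0.5, 0.5)."""
    stretched = f * d                      # spectrum is stretched by the factor d
    return (stretched + 0.5) % 1.0 - 0.5   # periodic repetition folds it back


def is_aliased(f, d):
    """True if the tone lies outside the alias-free band |f| < 1/(2d)."""
    return abs(f) >= 1.0 / (2 * d)
```

With these definitions, a tone at 0.2 downsampled by two stays inside the alias-free band and moves to 0.4, whereas a tone at 0.3 is flagged as aliased: its stretched frequency 0.6 falls outside the fundamental interval and is folded back.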

C.  DECIMATION  BY  AN  INTEGER  FACTOR  ‘D’  WITH  FIR  MAC  AND 
POLYPHASE  DECOMPOSITION 

The structure of the decimation-by-two operation can be easily extended to a
more general decimation-by-$D$ operation for any $D > 2$. The decimated signal obtained
from an input signal $x[n]$, which is filtered by a FIR filter $h[n]$ (decomposed into its
polyphase components) and then downsampled by an integer factor of $D$, is given by

$$y[n] = \sum_{k=0}^{DN-1} h[k]\, x[nD - k], \qquad (3.6)$$

with $D$ integer and $x[n] = x(nT_s)$, $y[n] = y(nDT_s)$ the input and output sequences
sampled at rates $F_s = 1/T_s$ and $F_s/D = 1/(DT_s)$ respectively.
The $D$ polyphase components of the FIR filter are defined by

$$h_\ell[k] = h[kD + \ell], \qquad (3.7)$$

with $\ell = 0, \ldots, D-1$ and $k = 0, \ldots, N-1$. The decimated output is the superposition
of the $D$ phases and it is given by the following expression:

$$y[n] = \sum_{k=0}^{N-1} h_0[k]\, x[(n-k)D] + \sum_{k=0}^{N-1} h_1[k]\, x[(n-k)D - 1] + \ldots + \sum_{k=0}^{N-1} h_{D-1}[k]\, x[(n-k)D - D + 1]. \qquad (3.8)$$

Equation (3.8) can be further decomposed as:

$$y[n] = \ldots + h_\ell[0]\, x[nD - \ell] + \sum_{k=1}^{N-1} h_\ell[k]\, x[(n-k)D - \ell] + \ldots \qquad (3.9)$$

During the computational interval $(Dn-D)T_s < t < (Dn)T_s$ the data vector
needs to be updated with samples $x[Dn-(D-1)]$ up to $x[Dn]$, while the data in the $D$
summations are available before time $(Dn-D)T_s$.
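
Equations (3.6) to (3.8) can be cross-checked with a few lines of code. The following sketch is a plain Python illustration with hypothetical function names, assuming zero samples outside the input record; it splits the filter according to (3.7) and compares the sum of the $D$ phases with the direct form.

```python
def x_at(x, i):
    """Return x[i], treating samples outside the record as zero (assumption)."""
    return x[i] if 0 <= i < len(x) else 0.0


def polyphase_components(h, d):
    """Equation (3.7): component l holds h[k*d + l] for k = 0, 1, ..."""
    return [h[l::d] for l in range(d)]


def decimate_direct(h, x, d):
    """Equation (3.6): y[n] = sum over k of h[k] * x[n*d - k]."""
    return [sum(h[k] * x_at(x, n * d - k) for k in range(len(h)))
            for n in range(len(x) // d)]


def decimate_polyphase(h, x, d):
    """Equation (3.8): superposition of the d phases."""
    phases = polyphase_components(h, d)
    y = []
    for n in range(len(x) // d):
        acc = 0.0
        for l, hl in enumerate(phases):
            acc += sum(hl[k] * x_at(x, (n - k) * d - l) for k in range(len(hl)))
        y.append(acc)
    return y
```

The filter length does not need to be a multiple of $D$ here; shorter phases simply contribute fewer terms.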

The design needed to perform the decimation-by-$D$ operation is similar to the
decimation-by-two case. The memory vector for the input data points in the Dual Port
Ram has length $DN - 1$ and updates its value by

$$\mathbf{x}[(n)_{DN-1}] \leftarrow x[nD],$$
$$\vdots$$
$$\mathbf{x}[(n - \ell N)_{DN-1}] \leftarrow x[nD - \ell],$$
$$\vdots$$
$$\mathbf{x}[(n - (D-1)N)_{DN-1}] \leftarrow x[nD - D + 1].$$

The FIR filter coefficients vector, which is stored in the second memory of the
Dual Port Ram, is the concatenation of its polyphase components derived from expression
(3.7), with total length $DN - 1$.

In the implementation, $(n)_{DN-1} = 0, \ldots, DN-2$ is a periodic counter with update
rate $F_{ac} = N F_s$, which is the clock rate of the System Generator. Therefore the input data
is upsampled by the integer factor $N$.
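
The address arithmetic of these update rules can be illustrated as follows. This is a sketch only: the function name is hypothetical, and it simply evaluates the modulo expressions above for one output index $n$, without modeling the controller or the write-enable timing.

```python
def write_addresses(n, n_taps, d):
    """Return (memory address, input sample index) pairs written for output n.

    Phase l stores x[n*d - l] at address (n - l*N) modulo (d*N - 1).
    """
    modulus = d * n_taps - 1
    return [((n - l * n_taps) % modulus, n * d - l) for l in range(d)]
```

For example, with $D = 2$ and $N = 3$ the counter runs modulo 5, so for $n = 4$ the even sample $x[8]$ is written at address 4 and the odd sample $x[7]$ at address 1.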

The time representation of the ‘data address’ and ‘coefficients address’ sequences is
shown in Figure 28. In particular, at time $(Dn-D)T_s$ the accumulator is initialized as
$a((Dn-D)T_s) = 0$ (where $a$ denotes the accumulation function). At every subsequent
clock cycle $T_{ac} = 1/F_{ac}$ the accumulation will be updated by

$$a((Dn-D)T_s + \lambda T_{ac}) = a((Dn-D)T_s + (\lambda-1)T_{ac}) + \text{data\_addr}(\lambda) \cdot \text{coeff\_addr}(\lambda),$$

where $\lambda = 1, \ldots, DN$. The input data is updated at every $N$th multiple of $T_{ac}$, out of
$DN$ multiples in total. In particular, $(Dn-D+1)T_s = (Dn-D)T_s + N T_{ac}$. The output
$y[n]$ at time $DnT_s$ is shown in the timing diagram of Figure 28.

Figure 28. Timing Diagram for Decimation by D.

Apart from the new vectors that are stored in the Dual Port Ram, this design can
be obtained by simple extension of the decimation-by-two case to the more general
decimation-by-$D$.

Referring to Figure 7, the bitstream outcome $P$ of the DSP48 can be considered
as a set of blocks of length $DN$ in which the desired output samples of the
decimation-by-$D$ operation are embedded in every $(DN)$th element of each block, as
shown in the timing diagram in Figure 28. Therefore, by downsampling the data $P$ by the
factor $DN$ ($D$ times the factor which was used when the input data was upsampled) the
desired decimation-by-$D$ operation is performed.

In order to test the performance of the simulation for the decimation factor $D = 4$,
a sinusoidal waveform with frequency $F = 0.1 \cdot F_s$ (Hz) and sampling frequency $F_s = 1$
(Hz) is applied as an input.

The  FIR  filter  has  been  designed  as  an  Equiripple  filter  and  decomposed  into  four 
polyphase  components  with  the  following  characteristics: 

Passband: 0-0.2 (in terms of Digital Frequency f)

Stopband: 0.25-0.5 (in terms of Digital Frequency f)

Order: 29

The frequency spectrum of the original and the downsampled by $D = 4$ signal is
shown in Figure 29. We can verify that the frequency spectrum of the downsampled
signal is stretched (in terms of the digital frequency) by the integer factor of four
compared to the frequency spectrum of the original signal. Since the initial bandwidth of
the signal is less than $\frac{1}{2D}$ there is no aliasing effect. Therefore, the frequency of the
original signal is $f = 0.1$, while the frequency of the decimated-by-four signal is
$f = 4 \times 0.1 = 0.4$, where $f$ is the dimensionless digital frequency.

Figure 29. Frequency Spectrum of the Original and Downsampled by D = 4 Signal.


IV.  INTERPOLATION  BY  AN  INTEGER  FACTOR 


A.  THEORETICAL  PERSPECTIVE 

1.  Analysis  of  Upsampling  (Interpolation) 

In Software Defined Radios (SDR), the modulation process is performed in the
digital domain. The data rate of the transmitted information needs to be increased in order
to match the rate of the modulation (carrier frequency). An upsample operation
(interpolation) increases the number of samples per second of a given signal by an integer
factor $D$. An example of interpolation by an integer factor of $D = 3$ is shown in Figure 30.


Figure 30. Upsampling Operation. (Panel title: Upsampling by an integer factor D = 3.)

When a signal sampled at a rate $F_{s_1}$ with frequency spectrum $X(f_1)$ (in terms of
digital frequency) is resampled at a higher rate $F_{s_2} = D F_{s_1}$, where $D$ is an integer, the
resulting frequency spectrum of the resampled signal is given by the following
expression:

$$Y(f_2) = X(f_1)\big|_{f_1 = D f_2}. \qquad (4.1)$$


It is obvious from equation (4.1) that the new frequency spectrum is ‘squeezed’ in
terms of the digital frequency (horizontal axis) [8]. Consequently, since the frequency
spectrum of the resampled signal is a periodic repetition of one period within the
interval $\left(-\frac{1}{2}, \frac{1}{2}\right)$, additional image frequency components (‘ghost’
frequencies) will appear in the spectrum of the upsampled signal. These frequencies are
artifacts created by the upsampling operation. The frequency spectra of the original signal
and of the signal after upsampling by $D$ are shown in Figure 31.
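
To make the origin of the ‘ghost’ frequencies concrete, the sketch below lists where a real sinusoid appears after zero insertion by $D$. It is a minimal illustration with a hypothetical function name, assuming upsampling by zero insertion and a single real tone; it applies equation (4.1) to every period of the original spectrum and wraps the result into $[-0.5, 0.5)$.

```python
def image_frequencies(f, d):
    """Positive digital frequencies of a real tone at f after zero insertion by d."""
    found = set()
    for sign in (1.0, -1.0):                # a real tone has components at +f and -f
        for k in range(d):                  # d periods of X fit into one period of Y
            g = (sign * f + k) / d          # squeezed frequency, from (4.1)
            g = (g + 0.5) % 1.0 - 0.5       # wrap into [-0.5, 0.5)
            found.add(round(abs(g), 9))
    return sorted(found)
```

For a tone at 0.4 upsampled by two, the function returns the wanted component at 0.2 together with one image at 0.3, which is the component the Low Pass Filter has to remove.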

In order to eliminate the ‘ghost’ frequencies a Low Pass Filter (LPF) is needed
after the upsampling operation. The frequency response of the LPF along with its
specifications is depicted in Figure 32.


Figure  32.  Upsampling  and  Filtering  with  LPF. 

2.  Efficient  Implementation  of  Interpolation  Operation  using  Noble 
Identities  and  Filter’s  Polyphase  Decomposition 

An efficient way of implementing upsampling and filtering operations is by using
the Noble identities with the filter's polyphase decomposition. Since the filter in Figure
32 is operated at the higher sampling rate $F_{s_2}$, it would be desirable for the filter to be
placed before the upsampling operation, thus minimizing the cost. It can be shown that
the upsampling operation shown in Figure 32, with the LPF
$H(z) = \sum_{n=0}^{N} h(n) z^{-n}$, can be implemented as shown in Figure 33, where the filters
$H_k(z)$, for $k = 0, \ldots, D-1$, are the polyphase components
$H_k(z) = \sum_{n=0}^{N/D} h(nD + k) z^{-n}$ [8].


The upsampling network on the right of Figure 33, after the filters, is an
‘interlacer’ that interlaces the outputs of all $D$ filters, thus increasing the sampling rate.
This implementation is particularly attractive, since it has the same complexity as the
original but is implemented at the lowest sampling rate [8].

Figure 33. Efficient Implementation of Interpolation. (Block labels: LPF, Interlacer.)

B. INTERPOLATION BY TWO WITH FIR MAC AND POLYPHASE
DECOMPOSITION

From the polyphase decomposition the upsampling by two is determined as

$$\begin{aligned}
y[2n] &= \sum_{k=0}^{N-1} h_0[k]\, x[n-k], \\
y[2n+1] &= \sum_{k=0}^{N-1} h_1[k]\, x[n-k].
\end{aligned} \qquad (4.2)$$

Here $h_0[k] = h[2k]$ and $h_1[k] = h[2k+1]$ are the polyphase components (even and odd
samples) of the filter $h[n]$, while $x[n] = x(nT_s)$ with $T_s$ the sampling interval.
Consequently, the output rate is twice the input rate.

Equation (4.2) highlights the fact that the signal $x[n]$ is interpolated by
interlacing two signals, $y[2n]$ and $y[2n+1]$, which are computed independently.
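
The interlacing in (4.2) can be compared against the textbook arrangement of zero insertion followed by filtering at the high rate. The sketch below is a plain Python illustration with hypothetical function names, assuming zero samples outside the input record; it is not the Simulink/Xilinx model.

```python
def x_at(x, i):
    """Return x[i], treating samples outside the record as zero (assumption)."""
    return x[i] if 0 <= i < len(x) else 0.0


def interpolate2_polyphase(h, x):
    """Equation (4.2): two phases computed at the input rate, then interlaced."""
    h0 = h[0::2]  # h0[k] = h[2k]
    h1 = h[1::2]  # h1[k] = h[2k + 1]
    y = []
    for n in range(len(x)):
        y.append(sum(h0[k] * x_at(x, n - k) for k in range(len(h0))))  # y[2n]
        y.append(sum(h1[k] * x_at(x, n - k) for k in range(len(h1))))  # y[2n+1]
    return y


def interpolate2_reference(h, x):
    """Zero insertion by two, then filtering with h at the output rate."""
    up = []
    for v in x:
        up.extend([v, 0.0])
    return [sum(h[k] * x_at(up, m - k) for k in range(len(h)))
            for m in range(len(up))]
```

Both routes give the same output, but the polyphase version never multiplies by the inserted zeros.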

1.  Software  Implementation 

The Simulink/Xilinx implementation needed to perform the interpolation-by-two
has the same structure as the model presented in Figure 4, with parameters properly
chosen to match the new case. Specifically, the initial values of the vectors of the Dual
Port Ram, along with the controller (the logic circuit responsible for arranging data points
and FIR filter coefficients), are changed in order to implement equation (4.2). Furthermore,
the input data is upsampled to a rate equal to the System Generator's clock rate and the
output is downsampled by half the upsampling factor, implementing the
interpolation-by-two operation. Figure 34 illustrates this design.

Figure 34. Upsampling by Two.

In order to test the performance of the simulation, a sinusoidal signal is provided
as an input. The input signal is sampled at rate $F_s$, while the System Generator (Sysgen)
works at a higher sampling rate equal to $2(N+1)F_s$, with $2N-2$ being the degree of the
transfer function of the FIR filter which is decomposed into its polyphase components.
The generation of the polyphase filter is accomplished in the initialization of the
simulation. Since the new system rate provided by Sysgen is higher, the input data is
upsampled by the integer factor $2(N+1)$ with the corresponding Xilinx block.

The objective is to achieve a proper alignment of the data and the filter coefficients
so that they can be applied to a MAC, resulting in the interpolation-by-two operation.
Towards this goal, we need two memory vectors $\mathbf{x}$ and $\mathbf{h}$, containing the data and the
filter coefficients respectively, provided by the Dual Port Ram, and a MAC provided by
the DSP48.

a. Control Logic for Data and Filter Coefficients
The vector $\mathbf{h}$ of the filter coefficients is defined as

$$\mathbf{h} = [h[0], h[2], \ldots, h[2N-2], 0, 0, h[1], h[3], \ldots, h[2N-3], 0]$$

and it is the concatenation of the two polyphase components (one for the even and one
for the odd samples) of a FIR filter of length $2N-1$ (which is generated in MATLAB),
with three properly placed additional zeros. The length of the vector $\mathbf{h}$ is $2N+2$ and its
value remains unchanged during the upsampling-by-two operation. Therefore the ports
‘dinb’ (data input b) and ‘web’ (write enable b) are set to false. The zero coefficients are
required to handle computational issues that arise from the controller and from using the
DSP48 block as a MAC, which are explained in the MAC procedure.
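
The layout of this coefficient memory can be sketched as follows. The function name is hypothetical and the code only reproduces the ordering stated above for a filter of odd length $2N-1$; it does not model the controller.

```python
def interpolation_coeff_memory(h):
    """Even phase, two zeros, odd phase, one zero (total length 2N + 2)."""
    if len(h) % 2 == 0:
        raise ValueError("expected an odd filter length 2N - 1")
    even = list(h[0::2])  # h[0], h[2], ..., h[2N-2]  (N values)
    odd = list(h[1::2])   # h[1], h[3], ..., h[2N-3]  (N - 1 values)
    return even + [0.0, 0.0] + odd + [0.0]
```

With this ordering the memory splits into two halves of $N+1$ entries, one per output phase, and each half ends with a zero coefficient.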

The input data vector stored in the first part of the memory of the Dual
Port Ram is a vector $\mathbf{x}$ of length $N$ and it is updated as $\mathbf{x}[(n)_N] \leftarrow x[n]$. The data
address counter is defined as $(n)_N = 0, 1, \ldots, N-1$ and it is repeated twice during the
sampling interval $T_s$. The initial value of the memory vector $\mathbf{x}$ is set to the initial
conditions (say zero, for example) and it updates its value according to the corresponding
‘address’ and ‘write enable’ ports provided by the controller. Figure 35 illustrates the
controller of the simulation.

Figure 35. Controller.

The time representation of the ‘data address’ and ‘coefficients address’
sequences of Figure 35 is shown in Figure 36. In particular, at time $(2n-2)T_s$ the
accumulator is initialized as $a((2n-2)T_s) = 0$ (where $a$ denotes the content of the
accumulation). At every subsequent clock cycle $T_{ac} = 1/F_{ac}$, the accumulation will be
updated by

$$a((2n-2)T_s + \lambda T_{ac}) = a((2n-2)T_s + (\lambda-1)T_{ac}) + \text{data\_addr}(\lambda) \cdot \text{coeff\_addr}(\lambda),$$

and

$$a((2n-1)T_s + \lambda T_{ac}) = a((2n-1)T_s + (\lambda-1)T_{ac}) + \text{data\_addr}(\lambda) \cdot \text{coeff\_addr}(\lambda),$$

with $\lambda$ counting the clock cycles within each accumulation interval. The outputs
$y[2n-1]$, $y[2n]$ at times $(2n-1)T_s$ and $2nT_s$ respectively are shown in the timing
diagram of Figure 36.


Figure 36. Time Representation of Simulation. (Timing diagram rows: input data x[n-1]
and x[n], data address, coefficient address, phase, and output data y[2n-2], y[2n-1], y[2n];
each phase lasts (N+1) clock cycles.)

In  order  to  demonstrate  the  functionality  of  the  implementation,  Figure  36 
illustrates  the  timing  of  the  various  signals  involved. 

The outcome of the Dual Port Ram is a set of bitstreams, one from port A
(data points) and one from port B (filter coefficients). The bitstream of port B is a
repetition of the vector $\mathbf{h}$. It can be inferred that the outcome of port A is a recurrent
window of length $2N+2$, which is subdivided into two windows of length $N+1$. At
every time $T_{ac}$ both subwindows are updated with the same data, introducing a shift by
one position from left to right, while the first subwindow starts updating from the second
sample. Figure 37 illustrates the outcome of the Dual Port Ram.

Figure 37. Outcome of Dual Port Ram. (Dual Port RAM with ports addra, dina, wea,
addrb, dinb, web. Initial memory vector
[x, h] = [0, ..., 0, h[0], h[2], ..., h[2N-2], 0, 0, h[1], h[3], ..., h[2N-3], 0].
Port B output: h[0], h[2], ..., h[2N-2], 0, 0, h[1], h[3], ..., h[2N-3], 0. Each output block
has length 2N+2 and is split into two subwindows of length N+1.)

b.  Sequential  Multiplication  and  Accumulation  (MAC)  of  Data  and 
Filter  Coefficients  using  DSP48 

The output bitstream from the Dual Port Ram, as shown in Figure 37,
is processed by the DSP48 Xilinx block, which implements the MAC. Its operation
mode is defined by $P = P + A \cdot B$ (referring to Figure 7), where the product of each pair
of Dual Port Ram outputs from ports A and B is accumulated with the previous
products. A reset signal (selected from the DSP48 options) for the outcome $P$ is
introduced every $N+1$ clock cycles, provided by the properly delayed ‘write enable 1’
signal of the controller of the Dual Port Ram (referring to Figure 35). The delay is
adjusted so that the reset of the outcome $P$ occurs every $N+1$ samples, where the
coefficients of vector $\mathbf{h}$ are zero, without affecting the accumulation procedure of the
interpolation-by-two operation, and therefore there is no loss of information. The third
zero coefficient of vector $\mathbf{h}$, which is placed at the first odd sample, does not affect the
accumulation process of the interpolation-by-two operation either, since the first
subwindow of the data vector is updated from the second sample.

Consequently, considering the length ($2N+2$) of the block pairs in Figure
37, whenever a product $x[k] \cdot h[0]$ is accumulated to the previous $2N+1$ sums of
products of each block pair, the two data points $y[n]$ of the interpolation-by-two
operation are provided at successive $(N+1)T_{ac}$ intervals, as also shown in the timing
diagram in Figure 36. Figure 38 shows the first three points $y[0]$, $y[1]$, $y[2]$ provided at
times $\mu(N+1)T_{ac}$, where $\mu = 1, 2, 3$.


Figure  38.  Outcome  of  DSP48. 

Referring to Figure 7, the bitstream outcome $P$ of the DSP48 can be
considered as a set of blocks of length $2N+2$ in which the desired output samples of the
interpolation-by-two operation are embedded in every $(N+1)$th and $(2N+2)$th element
of each block, as shown in the timing diagram in Figure 36. Therefore, by downsampling
the data $P$ by the factor $N+1$ (half the factor that was used when the input data was
upsampled) the desired interpolation-by-two operation is performed.

c.  Results 

In order to test the performance of the simulation a sinusoidal waveform
with frequency $F = 0.4 \cdot F_s$ (Hz) and sampling frequency $F_s = 1$ (Hz) is applied as an
input.

The  FIR  filter  has  been  designed  as  an  Equiripple  filter  and  decomposed 
into  two  polyphase  components  with  the  following  characteristics: 

Passband: 0-0.4 (in terms of Digital Frequency f)

Stopband: 0.45-0.5 (in terms of Digital Frequency f)

Order: 31

The frequency spectrum of the original and the upsampled-by-two signal
is shown in Figure 39. We can verify that the frequency spectrum of the upsampled-by-two
signal is squeezed (in terms of the digital frequency) by the integer factor of two
compared to the frequency spectrum of the original signal. Therefore, the frequency of
the original signal is $f = 0.4$ while the frequency of the upsampled-by-two signal is
$f = 0.4/2 = 0.2$ (where $f$ is the dimensionless digital frequency).

Figure 39. Frequency Spectrum of the Original and Upsampled by Two Signal.

C.  INTERPOLATION  BY  AN  INTEGER  FACTOR  ‘D’  WITH  FIR  MAC  AND 
POLYPHASE  DECOMPOSITION 

The structure introduced for the interpolation-by-two operation can be easily
extended to a more general interpolation-by-D operation for any D > 2. The interpolated
signal y[n], obtained from an input signal x[n] that is upsampled by an integer factor
D and then filtered by an FIR filter h[n] (decomposed into its polyphase components), is
given by the following expression:


\[
\begin{aligned}
y[Dn] &= \sum_{k=0}^{N-1} h_0[k]\, x[n-k], \\
y[Dn+1] &= \sum_{k=0}^{N-1} h_1[k]\, x[n-k], \\
&\;\;\vdots \\
y[Dn+D-1] &= \sum_{k=0}^{N-1} h_{D-1}[k]\, x[n-k],
\end{aligned}
\qquad (4.3)
\]

with D integer and $x[n] = x(nT_s)$, $y[n] = y(nT_s/D)$ the input and output sequences
sampled at rates $F_s = 1/T_s$ and $DF_s = D/T_s$ respectively.

The D polyphase components of the FIR filter are defined as:

\[
h_\ell[k] = h[kD + \ell] \qquad (4.4)
\]

with $\ell = 0, \ldots, D-1$ and $k = 0, \ldots, N-1$.
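
Expressions (4.3) and (4.4) can be illustrated with a short numerical check. The sketch below is written in Python for illustration only (it is not the Simulink/Sysgen design): it splits a filter into its D polyphase components and confirms that the polyphase form gives the same output as zero-stuffing by D followed by ordinary FIR filtering. The function names and the small integer test vectors are assumptions; samples before the start of the input are taken as zero.

```python
def polyphase_components(h, D):
    """Split h into D components h_l[k] = h[k*D + l], zero-padding to a multiple of D."""
    padded = list(h) + [0.0] * (-len(h) % D)
    return [padded[l::D] for l in range(D)]


def interpolate_polyphase(x, h, D):
    """Evaluate y[D*n + l] = sum_k h_l[k] * x[n - k], with x[m] = 0 for m < 0."""
    comps = polyphase_components(h, D)
    y = []
    for n in range(len(x)):
        for h_l in comps:
            acc = 0.0
            for k, coeff in enumerate(h_l):
                if n - k >= 0:
                    acc += coeff * x[n - k]
            y.append(acc)
    return y


def interpolate_direct(x, h, D):
    """Reference: zero-stuff by D, then convolve with h (first D*len(x) samples)."""
    up = []
    for sample in x:
        up.append(sample)
        up.extend([0.0] * (D - 1))
    y = []
    for m in range(len(up)):
        acc = 0.0
        for j, coeff in enumerate(h):
            if m - j >= 0:
                acc += coeff * up[m - j]
        y.append(acc)
    return y


# Small integer example: D = 2, so h_0 = [1, 3, 5] and h_1 = [2, 4, 6].
x = [1, 2, 3]
h = [1, 2, 3, 4, 5, 6]
y_poly = interpolate_polyphase(x, h, 2)
y_ref = interpolate_direct(x, h, 2)
print(y_poly)  # [1.0, 2.0, 5.0, 8.0, 14.0, 20.0]
```

The polyphase form never multiplies by the inserted zeros, so each output sample costs N multiplications instead of DN.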

The simulation needed to perform the interpolation-by-D operation is similar to
the interpolation-by-two case. The memory vector for the input data points in the Dual
Port Ram has length DN and it is updated as $x[(n)_{DN}] \leftarrow x[nD]$. The data address
counter is defined as $(n) = 0, \ldots, DN - D - 1$ and it is repeated D times during the input
interval $T_s$.

The  FIR  filter  coefficients  vector,  which  is  stored  in  the  second  memory  of  the 
Dual  Port  Ram,  is  made  of  the  polyphase  components  derived  from  expression  (4.4)  with 
total  length D(N  +  1) . 

The input signal is sampled at a rate $F_s$, while System Generator (Sysgen) works
at a higher sampling rate equal to $D(N+1)F_s$. Therefore, the input data is upsampled by
the integer factor $D(N+1)$ with the corresponding Xilinx block.
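
As a quick check of the rates involved, consider the hypothetical values $D = 4$, $N = 8$ and $F_s = 1$ MHz. These numbers are chosen only for illustration and are not taken from the design in this work, and $F_{clk}$ and $T_{clk}$ are notation introduced here for the Sysgen rate and its period:

\[
F_{clk} = D(N+1)F_s = 4 \cdot 9 \cdot 1\ \text{MHz} = 36\ \text{MHz},
\qquad
T_{clk} = \frac{T_s}{D(N+1)} = \frac{1\ \mu\text{s}}{36} \approx 27.8\ \text{ns}.
\]

Under these assumptions, each input interval $T_s$ contains 36 clock cycles, that is, $N + 1 = 9$ cycles for each of the $D = 4$ output samples.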

The time representations of the ‘data address’ and ‘coefficients address’ sequences are
shown in Figure 40. In particular, at time $(Dn-D)T_s$ the accumulator is initialized as
$a\big((Dn-D)T_s\big) = 0$ (where ‘a’ denotes the accumulation function). At every subsequent
clock cycle $T_{ac} = \frac{T_s}{D(N+1)}$ the accumulation will be updated by

\[
\begin{aligned}
a\big((Dn-D)T_s + \lambda T_{ac}\big) &= a\big((Dn-D)T_s + (\lambda-1)T_{ac}\big) + \big[\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big], \\
a\big((Dn-\ell)T_s + \lambda T_{ac}\big) &= a\big((Dn-\ell)T_s + (\lambda-1)T_{ac}\big) + \big[\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big], \\
a\big((Dn-1)T_s + \lambda T_{ac}\big) &= a\big((Dn-1)T_s + (\lambda-1)T_{ac}\big) + \big[\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big],
\end{aligned}
\]

where $\lambda = 1, 2, \ldots$ indexes the clock cycles after initialization. The outputs
$y[Dn-D], \ldots, y[Dn]$ are shown in the timing diagram of Figure 40.


[Timing diagram: input data x[n-1] and x[n]; data address, coefficient address and phase sequences; time axis in steps of Tac, running from Dn-D to Dn-1.]

Figure 40. Timing Diagram for Interpolation by D.

Apart  from  the  new  vectors  that  are  stored  in  the  Dual  Port  Ram,  this  design  can 
be  obtained  by  simple  extension  of  the  interpolation-by-two  case  to  the  more  general 
interpolation-by-D. 

Referring to Figure 7, the bitstream outcome P of the DSP48 can be considered
as a set of blocks of length N + 1 in which the desired coefficients of the
interpolation-by-D operation are embedded in every (N + 1)th element of each block, as shown in the
timing diagram in Figure 40. Therefore, by downsampling the data P by the factor
N + 1 (D times less than the factor which was used when the input data was upsampled), the
desired interpolation-by-D operation is performed.
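
The time-multiplexed accumulation and the final downsampling by N + 1 can be sketched in software. The model below is a simplified assumption about the data flow, not a description of the DSP48 block itself: it ignores pipeline latency and models the initialization as one clock cycle in which the accumulator is cleared, followed by N multiply-accumulate cycles. The names `mac_stream` and `downsample`, and the small test vectors, are hypothetical.

```python
def mac_stream(x, comps):
    """Simulate one time-multiplexed MAC and return the accumulator stream P.

    For every input sample and every polyphase component, the accumulator is
    cleared on one clock cycle and then updated on N further cycles, so each
    block of P has N + 1 entries and ends with a finished output sample.
    """
    n_taps = len(comps[0])          # N coefficients per polyphase component
    history = [0.0] * n_taps        # history[k] holds x[n - k]
    p = []
    for sample in x:
        history = [sample] + history[:-1]
        for h_l in comps:
            acc = 0.0
            p.append(acc)           # initialization cycle
            for k in range(n_taps):
                acc += h_l[k] * history[k]
                p.append(acc)       # one multiply-accumulate per clock cycle
    return p


def downsample(p, factor, offset):
    """Keep one sample out of every `factor`, starting at index `offset`."""
    return p[offset::factor]


# D = 2 polyphase components with N = 3 coefficients each (illustrative values).
comps = [[1, 3, 5], [2, 4, 6]]
x = [1, 2, 3]
N = len(comps[0])
P = mac_stream(x, comps)
y = downsample(P, N + 1, N)
print(len(P))  # 24 clock cycles: 3 inputs * D * (N + 1)
print(y)       # [1.0, 2.0, 5.0, 8.0, 14.0, 20.0]
```

In this model the stream P runs at D(N + 1) samples per input sample, and keeping the last entry of each block of length N + 1 leaves D output samples per input sample, which is the interpolation-by-D result.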

In order to test the performance of the simulation for interpolation factor D = 4, a
sinusoidal waveform with frequency F = 0.4·Fs (Hz) and sampling frequency Fs = 1
(Hz) is applied as an input.

The  FIR  filter  has  been  designed  as  an  Equiripple  filter  and  decomposed  into  four 
polyphase  components  with  the  following  characteristics: 

Passband: 0-0.2 (in terms of Digital Frequency f)

Stopband: 0.25-0.5 (in terms of Digital Frequency f)

Order: 120

The frequency spectra of the original signal and the signal upsampled by D = 4 are
shown in Figure 41. We can verify that the frequency spectrum of the signal upsampled by
D = 4 is compressed (in terms of the digital frequency) by the integer factor of four
compared to the frequency spectrum of the original signal. Therefore, the frequency of
the original signal is f = 0.4, while the frequency of the upsampled-by-four signal is
f = 0.4/4 = 0.1 (where f is the dimensionless digital frequency).

Figure 41. Frequency Spectrum of the Original and Upsampled by D = 4 Signal.

V. CONCLUSIONS


A.  SUMMARY  OF  THE  WORK 

In  this  research,  we  presented  an  architecture  for  implementing  resampling 
operations  in  FPGAs.  The  particularly  interesting  feature  of  this  approach  is  the  use  of  a 
specific  functional  block  (the  DSP48),  which  is  optimized  for  DSP  applications  in  real 
time.  Although  a  number  of  applications  are  possible,  this  approach  is  particularly 
attractive  in  the  implementation  of  Software  Defined  Radios  (SDR). 

Three classes of DSP operations have been implemented in software in the Simulink
design environment:

1)  Finite  Impulse  Response  filter 

2)  Decimation  by  an  integer  factor 

3)  Interpolation  by  an  integer  factor 

All  subsystems  of  the  design  are  fully  accessible  by  the  designer  at  every  stage. 

The key ingredient was the use of a Multiplier and Accumulator (MAC)
architecture carried out by the DSP48 slice, which is an efficient Xilinx block (from
the Simulink library) for many DSP applications. The two inputs to the DSP48 were
provided by the Dual Port Ram Xilinx block, which is a memory device that allows the
user to specify the width and the values for each memory part in order to perform the
three above-mentioned operations.

The Xilinx System Generator was used to realize the software design on a
Virtex-4 FPGA, increasing the computation data rate according to each case.

MATLAB code was used to generate the FIR filter and its polyphase
decomposition in the design, and also to verify the performance by producing
plots that agree with the corresponding theoretical results.

B. SUGGESTION FOR FUTURE WORK

The designs presented in this thesis will be part of a general Software Defined
Radio (SDR) implementation. In particular, they will be integrated with both the modulation and
demodulation processes, so that the whole radio will be implemented in software.

There  are  several  issues  to  be  addressed.  The  most  important  is  whether  this 
approach  can  be  implemented  in  real  time  using  a  reasonable  amount  of  chip  “real 
estate”.  In  order  to  address  this  problem,  higher-level  language  code  needs  to  be  used  to 
implement  the  algorithm  on  the  chip. 

This  is  part  of  an  ongoing  research  project. 